Trustworthy Feature Importance Avoids Unrestricted Permutations
Borgonovo, Emanuele, Cappelli, Francesco, Lu, Xuefei, Plischke, Elmar, Rudin, Cynthia
Since their introduction by Breiman (2001), permutation-based feature importance measures have been widely adopted. However, randomly permuting the entries of a dataset may create new points far from the original data or even "impossible data." In a permuted dataset, we may find children who are retired or individuals who graduated from high school before they were born (Mase et al. 2022, p. 1). Forcing ML models to make predictions at these points causes them to extrapolate, making explanations unreliable (Hooker et al. 2021). Every non-trivial permutation-based variable importance measure, including SHAP (Lundberg and Lee 2017), Knockoffs (Barber and Candès 2015), conditional model reliance (Fisher et al. 2019), and accumulated local effect (ALE) plots (Apley and Zhu 2020), suffers from this problem. We propose and compare three new strategies to address extrapolation issues. The first combines conditional model reliance from Fisher et al. (2019) with a Gaussian transformation. By mapping data quantiles to a Gaussian distribution and back, we adjust only the quantiles of point values, significantly reducing extrapolation. Under a Gaussian copula assumption for the feature distribution, we prove that the new data points follow the same probability distribution as the original data.
Adaptive Replication Strategies in Trust-Region-Based Bayesian Optimization of Stochastic Functions
Binois, Mickael, Larson, Jeffrey
We develop and analyze a method for stochastic simulation optimization that relies on Gaussian process models within a trust-region framework. We focus on the case where the variance of the objective function is large. We propose to rely on replication and local modeling to cope with this high-throughput regime, in which obtaining accurate results may require a large number of evaluations, while still maintaining good performance. We propose several schemes to encourage replication, ranging from the choice of the acquisition function to the setup of evaluation costs. Compared with existing methods, our results indicate good scaling in terms of both accuracy (several orders of magnitude better than existing methods) and speed (taking evaluation costs into account).
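One reason replication is computationally attractive for Gaussian process models is a standard identity for homoscedastic GPs: if a_i replicates are taken at a unique input, the posterior computed from all N observations equals the one computed from the n unique inputs, using the replicate means as responses and noise variance σ²/a_i. The sketch below checks this numerically for the posterior mean in one dimension. It assumes a zero-mean GP with a squared-exponential kernel (length scale 0.5) and a known noise variance; the function names are hypothetical, and this illustrates the identity only, not the authors' trust-region algorithm.

```python
import math


def sq_exp(a, b, length=0.5):
    """Squared-exponential kernel (unit variance; the length scale is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_mean(xs, ys, noise_vars, x_new):
    """Posterior mean of a zero-mean GP with a separate noise variance per observation."""
    K = [[sq_exp(a, b) + (noise_vars[i] if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(sq_exp(x_new, a) * w for a, w in zip(xs, alpha))


def gp_mean_replicated(xs, ys, noise_var, x_new):
    """Same prediction from one averaged response per unique input (noise_var / a_i)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    uniq = list(groups)
    means = [sum(groups[u]) / len(groups[u]) for u in uniq]
    return gp_mean(uniq, means, [noise_var / len(groups[u]) for u in uniq], x_new)


# Usage: 6 noisy evaluations at only 3 distinct inputs.
xs = [0.0, 0.0, 0.0, 0.5, 0.5, 1.0]
ys = [1.0, 1.4, 0.7, 0.2, -0.1, -0.8]
full = gp_mean(xs, ys, [0.1] * len(xs), 0.3)
reduced = gp_mean_replicated(xs, ys, 0.1, 0.3)
print(full, reduced)  # the two values agree up to floating-point rounding
```

Because the linear system in the replicated version has one row per unique location rather than one per evaluation, heavy replication keeps the cost of fitting the model tied to the number of distinct design points.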
AugmenTRAJ: A framework for point-based trajectory data augmentation
Data augmentation has emerged as a powerful technique in machine learning, strengthening model robustness while mitigating overfitting and underfitting by generating diverse synthetic data. Nevertheless, despite its success in other domains, data augmentation's potential remains largely untapped in mobility data analysis, primarily due to the intricate nature and unique format of trajectory data. Additionally, there is a lack of frameworks capable of point-wise data augmentation, which can reliably generate synthetic trajectories while preserving the inherent characteristics of the original data. To address these challenges, this research introduces AugmenTRAJ, an open-source Python3 framework designed explicitly for trajectory data augmentation. AugmenTRAJ offers a reliable and well-controlled approach for generating synthetic trajectories, thereby enabling the harnessing of data augmentation benefits in mobility analysis. This thesis presents a comprehensive overview of the methodologies employed in developing AugmenTRAJ and showcases the various data augmentation techniques available within the framework. AugmenTRAJ opens new possibilities for enhancing the performance and generalization capabilities of mobility data analysis models by providing researchers with a practical and versatile tool for augmenting trajectory data. Its user-friendly implementation in Python3 facilitates easy integration into existing workflows, offering the community an accessible resource to leverage the full potential of data augmentation in trajectory-based applications.
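As an illustration of what point-wise augmentation can look like, the sketch below perturbs every point of a GPS trajectory by a bounded random offset while keeping timestamps and point order. The function name `jitter_trajectory`, the (timestamp, latitude, longitude) tuple layout, the example coordinates, and the local flat-Earth offset approximation are assumptions for this example; it does not reproduce AugmenTRAJ's API or its specific augmentation techniques.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def jitter_trajectory(points, max_offset_m, rng):
    """Return a synthetic trajectory with each point moved at most max_offset_m.

    points: list of (timestamp, lat_deg, lon_deg). Offsets are drawn uniformly
    over a disk and converted to degrees with a local flat-Earth approximation,
    which is adequate for offsets of a few tens of meters.
    """
    synthetic = []
    for t, lat, lon in points:
        dist = max_offset_m * math.sqrt(rng.random())  # sqrt gives uniform density over the disk
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        dlat = dist * math.cos(bearing) / EARTH_RADIUS_M
        dlon = dist * math.sin(bearing) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
        synthetic.append((t, lat + math.degrees(dlat), lon + math.degrees(dlon)))
    return synthetic


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Usage: three points of a short (hypothetical) walk, augmented with offsets of at most 15 m.
rng = random.Random(42)
walk = [(0, 45.0700, 7.6860), (30, 45.0705, 7.6868), (60, 45.0711, 7.6875)]
augmented = jitter_trajectory(walk, 15.0, rng)
for (t, lat, lon), (_, lat2, lon2) in zip(walk, augmented):
    print(t, round(haversine_m(lat, lon, lat2, lon2), 1))
```

Keeping the timestamps fixed preserves the original sampling pattern; this sketch does not enforce further constraints, such as plausible speeds between consecutive points.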
i-Octree: A Fast, Lightweight, and Dynamic Octree for Proximity Search
Zhu, Jun, Li, Hongyi, Wang, Shengjie, Wang, Zhepeng, Zhang, Tao
Establishing correspondences between newly acquired points and historically accumulated data (i.e., the map) through nearest neighbor search is crucial in numerous robotic applications. However, static tree data structures are inadequate for handling large and dynamically growing maps in real time. To address this issue, we present the i-Octree, a dynamic octree data structure that supports both fast nearest neighbor search and real-time dynamic updates, such as point insertion, deletion, and on-tree down-sampling. The i-Octree is built upon a leaf-based octree and has two key features: a local spatially continuous storing strategy that allows fast access to points while minimizing memory usage, and local on-tree updates that significantly reduce computation time compared to existing static or dynamic tree structures. The experiments show that the i-Octree surpasses state-of-the-art methods, reducing run time by over 50% on real-world open datasets.
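To make the basic ingredients concrete (leaf buckets, insertion with splitting, deletion, and branch-and-bound nearest neighbor search), here is a minimal point octree in Python. It is a simplified sketch under stated assumptions, not the i-Octree: the class name, the default leaf capacity, the fixed root cube that must contain every point, and the absence of node merging and on-tree down-sampling are all choices made for this example.

```python
import random


class Octree:
    """Minimal leaf-bucket point octree over a fixed cube (illustrative sketch).

    All points must lie inside the root cube centered at `center` with half-width `half`.
    """

    def __init__(self, center, half, capacity=8, min_half=1e-3):
        self.center, self.half = center, half
        self.capacity, self.min_half = capacity, min_half
        self.points = []       # points are stored only in leaves
        self.children = None   # list of 8 child nodes after a split

    def _child_index(self, p):
        # Bit `axis` is set when p lies in the upper half along that axis.
        return sum(1 << axis for axis in range(3) if p[axis] >= self.center[axis])

    def insert(self, p):
        if self.children is not None:
            self.children[self._child_index(p)].insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.half > self.min_half:
            self._split()

    def _split(self):
        h = self.half / 2
        self.children = [
            Octree(tuple(c + (h if (i >> axis) & 1 else -h) for axis, c in enumerate(self.center)),
                   h, self.capacity, self.min_half)
            for i in range(8)
        ]
        pending, self.points = self.points, []
        for q in pending:
            self.children[self._child_index(q)].insert(q)

    def remove(self, p):
        """Delete one stored copy of p; return True if it was found."""
        if self.children is not None:
            return self.children[self._child_index(p)].remove(p)
        if p in self.points:
            self.points.remove(p)
            return True
        return False

    def _box_dist2(self, q):
        # Squared distance from q to this node's cube (0 if q is inside it).
        d2 = 0.0
        for axis in range(3):
            gap = abs(q[axis] - self.center[axis]) - self.half
            if gap > 0:
                d2 += gap * gap
        return d2

    def nearest(self, q, best=None):
        """Return (squared distance, point) of the nearest stored point, or None if empty."""
        if best is not None and self._box_dist2(q) >= best[0]:
            return best  # this cube cannot contain anything closer
        if self.children is None:
            for p in self.points:
                d2 = sum((a - b) ** 2 for a, b in zip(p, q))
                if best is None or d2 < best[0]:
                    best = (d2, p)
            return best
        for child in sorted(self.children, key=lambda c: c._box_dist2(q)):
            best = child.nearest(q, best)
        return best


# Usage: insert random points, delete some, then query.
rng = random.Random(1)
tree = Octree((0.0, 0.0, 0.0), 1.0, capacity=4)
pts = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(300)]
for p in pts:
    tree.insert(p)
for p in pts[:100]:
    tree.remove(p)
print(tree.nearest((0.2, -0.3, 0.5)))
```

Pruning relies on the squared distance from the query to a node's cube being a lower bound on the distance to any point stored in that node, which holds only because every point lies inside the root cube.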
Evolving Strategies for Competitive Multi-Agent Search
Bahceci, Erkin, Katila, Riitta, Miikkulainen, Risto
While evolutionary computation is well suited for automatic discovery in engineering, it can also be used to gain insight into how humans and organizations could perform more effectively. Using a real-world problem of innovation search in organizations as the motivating example, this article first formalizes human creative problem solving as competitive multi-agent search (CMAS). CMAS differs from existing single-agent and team search problems in that the agents interact through knowledge of other agents' searches and through the dynamic changes in the search landscape that result from these searches. The main hypothesis is that evolutionary computation can be used to discover effective strategies for CMAS; this hypothesis is verified in a series of experiments on the NK model, i.e., partially correlated and tunably rugged fitness landscapes. Specialized strategies are evolved for each competitive environment, as are general strategies that perform well across environments. These strategies are more effective and more complex than hand-designed strategies and a strategy based on traditional tree search. Using a novel spherical visualization of such landscapes, insight is gained into how successful strategies work, e.g., by tracking positive changes in the landscape. The article thus provides a possible framework for studying various human creative activities as competitive multi-agent search in the future.
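For readers unfamiliar with the NK model, the sketch below builds a standard NK landscape, in which each of N binary genes contributes a random value that depends on its own state and the states of K other genes; raising K makes the landscape more rugged. Letting each gene interact with the next K genes (cyclically), and using a one-bit-flip hill climber as a single-agent baseline, are assumptions for this example; the function names are hypothetical, and the article's competitive multi-agent dynamics and evolved strategies are not modeled.

```python
import random


def make_nk(n, k, rng):
    """Standard NK landscape: gene i interacts with itself and the next k genes (cyclic)."""
    # One random lookup table per gene, indexed by the k + 1 bits it depends on.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for offset in range(k + 1):
                idx = (idx << 1) | bits[(i + offset) % n]
            total += tables[i][idx]
        return total / n  # average contribution, so fitness lies in [0, 1)

    return fitness


def hill_climb(fitness, bits, rng, steps=200):
    """Single-agent baseline: keep a random one-bit flip unless it lowers fitness."""
    bits = list(bits)
    best = fitness(bits)
    for _ in range(steps):
        i = rng.randrange(len(bits))
        bits[i] ^= 1
        f = fitness(bits)
        if f >= best:
            best = f
        else:
            bits[i] ^= 1  # revert the flip
    return bits, best


# Usage: a 12-gene landscape with K = 3.
rng = random.Random(7)
fitness = make_nk(12, 3, rng)
start = [rng.randint(0, 1) for _ in range(12)]
final_bits, final_fit = hill_climb(fitness, start, rng)
print(round(fitness(start), 3), "->", round(final_fit, 3))
```

With K = 0 every gene contributes independently, so single-bit hill climbing can reach the global optimum; as K grows, local optima multiply, which is the ruggedness that makes strategy design in such landscapes nontrivial.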